Analysis of two-gene signatures and related drugs in small-cell lung cancer by bioinformatics

Abstract
Small-cell lung cancer (SCLC) has a poor prognosis and is frequently accompanied by systemic metastases at diagnosis. The molecular mechanisms underlying the development of SCLC remain unclear and require further investigation. The current research aims to identify relevant biomarkers and available drugs to treat SCLC. The bioinformatics analysis comprised three Gene Expression Omnibus datasets (GSE149507, GSE6044, and GSE30219). Differentially expressed genes (DEGs) were identified using the limma R package. Gene Ontology and Kyoto Encyclopedia of Genes and Genomes analyses were performed using the DAVID website. The DEG protein–protein interaction network was built with the Search Tool for the Retrieval of Interacting Genes/Proteins website and visualized in Cytoscape, and the CytoHubba plugin was used to screen the top ten hub genes. Quantitative real-time polymerase chain reaction was used to verify the expression levels of the top ten hub genes. Finally, potential drugs were screened and identified using the QuartataWeb database. In total, 195 upregulated and 167 downregulated DEGs were identified. The ten hub genes were NCAPG, BUB1B, TOP2A, CCNA2, NUSAP1, UBE2C, AURKB, RRM2, CDK1, and KIF11. Ten FDA-approved drugs were screened. Two genes, TOP2A and RRM2, and their related drugs could be prospective targets for SCLC treatment.


Introduction
Lung cancer is one of the most common cancers globally, with high morbidity and mortality rates. Each year, around 2.2 million new cases of lung cancer and over 1.8 million lung cancer deaths are reported worldwide [1]. Small-cell lung cancer (SCLC) is a subtype of lung cancer that accounts for 15% of all lung cancer-related deaths. Most SCLC patients exhibit systemic metastases at the time of diagnosis. As a result, the 5-year survival rate is around 5% [2,3]. Chemotherapy for SCLC frequently fails because SCLC is drug-resistant, which further worsens therapeutic outcomes [4]. In addition, although the recently introduced immune checkpoint inhibitors have brought hope for the treatment of SCLC, they benefit only a small number of SCLC patients [5]. Therefore, it is essential to develop efficient diagnostic techniques and treatment strategies for SCLC patients.
High-throughput genome sequencing has enabled significant advancements in the diagnosis and therapy of cancer [6]. By analyzing clinical and molecular sequencing data, bioinformatic methods can provide new ideas for understanding cancer development. With the development of bioinformatics, many studies on SCLC have been published, covering target genes [7][8][9], noncoding RNA (ncRNA) [10], and genome-wide analyses [11]. Although these results have improved our understanding of SCLC at the molecular level, they remain insufficient to explain the biological processes underlying SCLC. The molecular mechanisms of SCLC have not been completely elucidated.
The term "drug repositioning" refers to the process of using an FDA-approved drug to treat a disease or condition that is beyond its current indication [12]. The development of new antineoplastic drugs has stalled because of the high cost and long time to market, as well as concerns about drug toxicity and limited therapeutic effects [13]. Drug repositioning has inspired novel approaches to cancer treatment [14]. For example, disulfiram, a drug used for treating alcoholism, has been found to exhibit antitumor activity against non-small cell lung cancer (NSCLC), liver cancer, breast cancer, prostate cancer, pancreatic cancer, glioblastoma, and melanoma [15]. Another example is chlorpromazine, an FDA-approved antipsychotic drug that has been studied as an antineoplastic agent [16]. Therefore, we hypothesized that FDA-approved drugs could be screened using bioinformatic techniques to develop novel antineoplastic drugs for SCLC.
Because the molecular regulation of SCLC is still unknown, the therapeutic effects of drugs are limited. Therefore, it is necessary to identify biomarkers and drugs to treat SCLC. In the current work, bioinformatics analysis was used to discover promising biomarkers and available drugs for SCLC. We selected three microarray datasets from the Gene Expression Omnibus (GEO) database and identified differentially expressed genes (DEGs) between the SCLC groups and normal groups. We further performed Gene Ontology (GO) annotation, Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway annotation, and protein-protein interaction (PPI) analysis. Finally, possible biomarkers were identified, and potential drugs related to the treatment of SCLC were screened. Figure 1 presents the workflow of the current study.

SCLC dataset
A large amount of gene expression data, such as microarray and high-throughput data, is stored in the GEO database [17]. GSE149507, GSE6044, and GSE30219 were downloaded from the GEO database. The platform adopted for the microarray dataset was GPL23270 (Affymetrix Human Genome U133 Plus 2.0 Array). GSE149507 includes 36 samples: 18 tumor tissue samples and 18 normal tissue samples. GSE6044 includes nine SCLC tissue samples and five normal lung tissue samples. GSE30219 includes 21 SCLC samples and 14 non-tumoral lung samples.

Identification of DEGs
DEGs were identified using the limma R package. Genes with |logFC| > 2 and p-value < 0.05 were regarded as DEGs. DEGs with logFC < −2 were classified as downregulated, and those with logFC > 2 were classified as upregulated. Venn software was used to identify the DEGs shared by the three datasets.
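The filtering and overlap step can be illustrated with a minimal Python sketch. It is not the limma workflow itself: it assumes that per-gene logFC and p-values have already been computed for each dataset, and it assumes that upregulated and downregulated genes are intersected separately. The gene names and values are hypothetical placeholders.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical per-dataset results: gene -> (logFC, p_value).
Results = Dict[str, Tuple[float, float]]

def split_degs(results: Results, lfc_cutoff: float = 2.0,
               p_cutoff: float = 0.05) -> Tuple[Set[str], Set[str]]:
    """Return (upregulated, downregulated) genes passing both thresholds."""
    up = {g for g, (lfc, p) in results.items()
          if p < p_cutoff and lfc > lfc_cutoff}
    down = {g for g, (lfc, p) in results.items()
            if p < p_cutoff and lfc < -lfc_cutoff}
    return up, down

def overlapping_degs(datasets: List[Results]) -> Tuple[Set[str], Set[str]]:
    """Intersect up- and downregulated DEGs across all datasets."""
    ups, downs = zip(*(split_degs(r) for r in datasets))
    return set.intersection(*ups), set.intersection(*downs)

# Toy input with made-up values, for illustration only.
toy = [
    {"GENE_A": (2.5, 0.001), "GENE_B": (-3.1, 0.01), "GENE_C": (2.2, 0.20)},
    {"GENE_A": (3.0, 0.004), "GENE_B": (-2.4, 0.03), "GENE_C": (2.8, 0.01)},
    {"GENE_A": (2.1, 0.020), "GENE_B": (-2.2, 0.04), "GENE_D": (4.0, 0.001)},
]
common_up, common_down = overlapping_degs(toy)
print(common_up, common_down)
```

In this toy input, GENE_C is excluded because it fails the p-value cutoff in the first dataset, and GENE_D because it appears in only one dataset.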

Biological function analysis and pathway enrichment analysis
The online analysis tool DAVID (https://david.ncifcrf.gov/) was used to perform the GO and KEGG pathway enrichment analyses [18]. P < 0.05 was considered statistically significant.
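To make the idea of an enrichment p-value concrete, the sketch below computes a plain one-sided hypergeometric test for a single term. This is an illustrative assumption, not the statistic DAVID reports (DAVID uses a more conservative variant of the Fisher exact test, the EASE score), and the function name and arguments are hypothetical.

```python
from math import comb

def hypergeom_pvalue(k: int, n: int, K: int, N: int) -> float:
    """P(X >= k) for a one-sided over-representation test.

    N: genes in the background
    K: background genes annotated to the term
    n: genes in the DEG list
    k: DEGs annotated to the term
    """
    upper = min(n, K)
    total = comb(N, n)
    # Sum the probability of observing k, k+1, ..., upper annotated DEGs.
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total

# Toy numbers: 10 background genes, 4 annotated, 3 DEGs, 2 of them annotated.
p = hypergeom_pvalue(k=2, n=3, K=4, N=10)
print(round(p, 4))
```

A term would be called enriched under the stated threshold when this p-value falls below 0.05.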

PPI network construction and selection of hub genes
The STRING database (http://string-db.org) integrates protein interaction data to construct interaction networks [19].
In the current work, we created a network of interacting DEGs using STRING. The resulting interaction network was visualized with Cytoscape software (http://www.cytoscape.org/) [20]. In addition, the top ten hub genes (scores > 2) were screened with the CytoHubba plugin in Cytoscape [21].
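Hub-gene selection can be sketched as ranking the nodes of the network by a centrality score. CytoHubba offers several ranking methods and the text does not state which one was used, so the example below uses node degree purely as an assumption. The node names are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple

def top_hubs_by_degree(edges: Iterable[Tuple[str, str]],
                       top_n: int = 10) -> List[Tuple[str, int]]:
    """Rank nodes of an undirected network by degree (number of neighbours)."""
    # Drop self-loops and treat (a, b) and (b, a) as the same edge.
    unique = {frozenset(e) for e in edges if e[0] != e[1]}
    degree: Counter = Counter()
    for edge in unique:
        for node in edge:
            degree[node] += 1
    # Sort by descending degree, then by name for a stable order.
    return sorted(degree.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]

# Toy edge list with a duplicate edge and a self-loop.
toy_edges = [("P1", "P2"), ("P1", "P3"), ("P1", "P4"),
             ("P2", "P3"), ("P3", "P2"), ("P4", "P4")]
hubs = top_hubs_by_degree(toy_edges, top_n=2)
print(hubs)
```

With a real STRING export, the edge list would be read from the downloaded interaction table before ranking.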

Screening of existing drugs
QuartataWeb (http://quartata.csb.pitt.edu/) integrates, organizes, and displays drug-gene interactions and gene-pharmaceutical information from STITCH and DrugBank [22]. Using this database, and with support from previous literature, the top ten hub genes were screened for matches to existing FDA-approved drugs.

Cell culture
A normal human lung cell line (HLF-a) and a typical human SCLC cell line (NCI-H1688) were purchased from Procell (Wuhan, China). Procell supplied all cells and their dedicated culture media. All cells were cultivated at 37°C in a humidified atmosphere containing 5% CO2 and were authenticated by STR profiling.

RNA extraction and quantitative realtime polymerase chain reaction (qRT-PCR)
Total RNA was isolated from HLF-a and H1688 cells using TRIzol reagent (Invitrogen, CA, USA). Reverse transcription (RT) into complementary DNA (cDNA) was performed with a PrimeScript Reverse Transcriptase Reagent Kit (TaKaRa, Tokyo, Japan). The cDNA aliquots were then amplified with SYBR Green PCR Master Mix (TaKaRa, Tokyo, Japan). GAPDH served as the endogenous control. Table 1 presents the sequences of the sense and antisense primers.
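The text does not state how relative expression was calculated from the qRT-PCR data. A common choice, assumed here for illustration only, is the 2^-ΔΔCt method, which normalizes the target gene to the endogenous control (GAPDH) and then to the normal cell line. The Ct values below are hypothetical, and the method assumes roughly 100% amplification efficiency.

```python
def relative_expression(ct_target_sample: float, ct_ref_sample: float,
                        ct_target_control: float, ct_ref_control: float) -> float:
    """Fold change of a target gene by the 2^-ddCt method (assumed here).

    'sample' is the tumor cell line, 'control' is the normal cell line,
    and 'ref' is the endogenous control gene (e.g. GAPDH).
    """
    delta_sample = ct_target_sample - ct_ref_sample      # dCt in tumor cells
    delta_control = ct_target_control - ct_ref_control   # dCt in normal cells
    delta_delta = delta_sample - delta_control           # ddCt
    return 2.0 ** (-delta_delta)

# Hypothetical Ct values: the target amplifies 3 cycles earlier in tumor cells.
fold = relative_expression(ct_target_sample=22.0, ct_ref_sample=18.0,
                           ct_target_control=25.0, ct_ref_control=18.0)
print(fold)
```

A fold change above 1 indicates higher expression in the SCLC cell line than in the normal lung cell line.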

Identification of DEGs
The limma R package identified DEGs in the three datasets based on the filtering conditions. There were 22,189 DEGs in GSE30219, of which 8,563 were upregulated and 13,626 were downregulated. GSE149507 had 672 DEGs, of which 378 were upregulated and 294 were downregulated. GSE6044 had 8,537 DEGs, with 3,874 upregulated and 4,657 downregulated (Figure 2). Intersecting these DEGs with a Venn diagram yielded 362 overlapping DEGs, including 195 upregulated genes and 167 downregulated genes (Figure 3, Table 2).
Normalization was performed before the overlapping DEGs were obtained.

Biological function analysis and pathway enrichment analysis
For the biological process (BP) annotation, the DEGs were enriched in "positive regulation of gene expression," "positive regulation of transcription from RNA polymerase II promoter," "cell division," and "negative regulation of transcription from RNA polymerase II promoter." For the cellular component (CC) annotation, the DEGs were enriched in "extracellular space," "extracellular region," "plasma membrane," "cytoplasm," "nucleus," and "nucleoplasm." In the molecular function (MF) annotation, "protein binding," "identical protein binding," and "DNA binding" were clustered (Table 3). In the KEGG pathway analysis, the DEGs were primarily enriched in "Cell cycle," "Complement and coagulation cascades," and "Human T-cell leukemia virus 1 infection" (Table 4).

Construction of protein network and selection of hub gene
Blue nodes represent downregulated genes, whereas orange nodes represent upregulated genes (Figure 4a and b). The network comprised 156 nodes (genes) and 1,420 edges. NCAPG, BUB1B, TOP2A, CCNA2, NUSAP1, UBE2C, AURKB, RRM2, CDK1, and KIF11 were the top ten hub genes (Figure 4b). All CytoHubba parameters were left at their default values.

Screening for existing drugs that target the ten genes
The ten hub genes were matched with existing drugs using the drug-gene interaction module in the QuartataWeb database. Only two genes, TOP2A and RRM2, were matched, to ten candidate drugs (Teniposide, Etoposide, Daunorubicin, Doxorubicin, Amrubicin, Dactinomycin, Epirubicin, Idarubicin, Cladribine, and Gallium nitrate) (Table 5). The screening criteria were a drug-gene interaction cutoff of p < 0.05 and support from previous literature (Table 6).

Validation of gene expression in SCLC
To verify the expression levels of NCAPG, BUB1B, TOP2A, CCNA2, NUSAP1, UBE2C, AURKB, RRM2, CDK1, and KIF11, normal lung cell lines and SCLC cell lines were selected. qRT-PCR assays were used to quantify the relative mRNA expression of these genes in the normal lung and SCLC cell lines.
The mRNA expression of these genes was higher in the SCLC cell lines than in the normal lung cell lines (P < 0.05, Figure 5).

Discussion
Oncologists still face significant difficulties in treating SCLC owing to its high mutation rate and other clinical limitations. Patients with SCLC have low survival rates. However, during the past few decades, research on novel therapeutic strategies for treating SCLC has been limited [40]. Hence, there is an urgent need to identify target genes that allow SCLC to be treated specifically and effectively. The development of high-throughput techniques and sophisticated computational tools has enabled the identification of the relatively few genes that are characteristically deregulated in a given cancer cell among the thousands of normally expressed genes [41]. These methods offer novel approaches to diagnosis and treatment. Our study is the first to use bioinformatics to identify ten previously approved drugs that are related to RRM2 and TOP2A. Our findings might provide patients with SCLC with new therapy options.
The current study analyzed GSE149507, GSE6044, and GSE30219 using the limma package and screened 195 upregulated and 167 downregulated genes. In the BP annotation of the GO analysis, these genes were mostly enriched in "positive regulation of gene expression" and "cell division," which are closely related to gene expression. The DEGs in the CC category were mainly related to "extracellular space," "extracellular region," "plasma membrane," "nucleus," and "nucleoplasm," and they were closely related to the extracellular and nuclear microenvironment. Additionally, the DEGs in the MF category were enriched for the "protein binding" and "identical protein binding" terms and were tightly associated with protein synthesis. The GO findings suggest that altered gene expression and translation in cells may contribute to the pathogenesis of SCLC. In the KEGG analysis, the DEGs were primarily enriched in pathways including "Cell cycle" and "Complement and coagulation cascades." According to recent research, the cell cycle pathway plays a vital role in the development of SCLC [42]. Our results are consistent with those of earlier research.
Table 3 notes: Term: the basic unit of GO; each term corresponds to a GO class name, which is an attribute. Counts: the total number of genes enriched in the GO term. Ratio: the proportion of genes enriched in the GO term relative to the total genes. P value: significance; a P value < 0.05 is considered statistically significant for genes enriched in the term.
The PPI network comprised 156 nodes (genes) and 1,420 edges. We chose the top ten hub genes with the CytoHubba plugin in Cytoscape. The results suggest that overexpression of NCAPG, BUB1B, TOP2A, CCNA2, NUSAP1, UBE2C, AURKB, RRM2, CDK1, and KIF11 may promote SCLC progression.
Additionally, qRT-PCR was performed to assess the expression levels of the ten hub genes in SCLC cells and healthy human lung cells. The qRT-PCR results showed expression trends for the ten hub genes similar to those obtained from the bioinformatics analysis, supporting the accuracy of our findings.
Table 4 notes: Counts: the total number of genes enriched in the pathway. Ratio: the proportion of genes enriched in the pathway relative to the total genes. P value: significance of enrichment in the pathway; a P value < 0.05 was considered statistically significant for genes enriched in the pathway. From large to small, the degree of enrichment becomes more and more significant.
TOP2A encodes topoisomerase II (TOPII), a crucial enzyme that alters DNA topology by joining two double-stranded DNA molecules. TOPII is essential for gene transcription and replication. Aberrant expression of TOP2A can be related to poor prognosis in lung, esophageal, breast, ovarian, and oral cancers [43]. A previous study showed that TOP2A is involved in the occurrence and development of SCLC through inhibiting ectopic expression of miR-27a-5p and miR-34b-3p [44]. Taken together, TOP2A may be closely related to the occurrence and prognosis of SCLC.
AURKB, a serine/threonine protein kinase, is a crucial mitotic regulator. The oncogenic properties of AURKB have been studied in various tumors [45]. According to a recent study, SCLCs lacking the RB1 tumor suppressor gene are highly dependent on Aurora B kinase for survival. Patients with SCLC typically have RB1 gene mutations. Furthermore, the study found that inhibition of Aurora B kinase suppresses tumor cell growth in multiple SCLC models [46]. BUB1B, a member of the spindle assembly checkpoint protein family, is necessary for the anaphase of mitosis. Multiple studies have confirmed that abnormal BUB1B expression is related to tumor prognosis [47]. A large-scale analysis of the transcriptional profile of NSCLC suggested that BUB1B is a hub gene in lung adenocarcinoma (ADC) [48]. Thus, BUB1B is a promising candidate gene.
The cell cycle regulator Cyclin-A2 (CCNA2) regulates the G1/S and G2/M transitions of the cell cycle [49]. Impaired regulation of this process may contribute to the occurrence and development of tumors [50]. In addition, CCNA2 is abnormally expressed in other tumors [51].
RRM2, the ribonucleoside-diphosphate reductase subunit M2, has been associated with poor survival in patients with lung adenocarcinoma (LUAD) through network analysis and multivariate prognostic analysis [52]. Bioinformatics analysis by Chen et al. [53] identified RRM2 as a hub gene for SCLC.
UBE2C, a cell cycle-regulated ubiquitin-conjugating enzyme, regulates mitosis. Some researchers have reported that UBE2C is closely related to tumor occurrence, proliferation, and other behaviors [54]. Additionally, Wang et al. [55] discovered that UBE2C is tightly correlated with angiogenesis in NSCLC, confirming the speculation of previous studies.
Cyclin-dependent kinase 1 (CDK1) binds to cyclin B1 (CCNB1) or cyclin B2 (CCNB2) to form a complex that regulates the initiation of mitosis. Its dysregulation has been shown to correlate with tumor cell proliferation [56]. A bioinformatics study revealed that CDK1 stimulates the stemness of lung cancer cells through its interaction with SOX2 and that increased CDK1 expression is related to lower overall survival in patients with lung cancer. Therefore, CDK1 may serve as a potential biomarker [57].
The current work used bioinformatics methods to screen FDA-approved drugs and reposition them as new anticancer drugs. Our study showed that TOP2A and RRM2 matched predicted FDA-approved drugs.
Among the matches between the input hub genes and the selected drugs, the TOP2A gene matched eight drugs, all of which are used in cancer therapy. Of these, etoposide, a semisynthetic derivative of podophyllotoxin with antitumor activity, is used as first-line chemotherapy for SCLC [31,32,62]. Teniposide is a cytotoxic drug used to treat refractory childhood acute lymphoblastic leukemia [63]. Epirubicin is an anthracycline antineoplastic drug used as adjuvant therapy after primary breast cancer resection [64]. Idarubicin is also an anthracycline antineoplastic drug, and it is indicated for adult acute myeloid leukemia [65]. Doxorubicin is a cytotoxic anthracycline antibiotic. It has a wide range of indications and can be used to treat various cancers [66]. Valrubicin is a chemotherapeutic drug used for treating bladder cancer [67]. Daunorubicin is an anthracycline aminoglycoside antitumor drug used to induce remission in adults with acute non-lymphocytic leukemia and in children and adults with acute lymphoblastic leukemia [68]. Amrubicin, an anthracycline, is currently being studied for SCLC treatment [36].
RRM2 matched only two FDA-approved drugs. Cladribine is a purine analog and antineoplastic agent used to treat adults with highly active relapsing multiple sclerosis [69]. Gallium nitrate is used to treat cancer-related hypercalcemia and non-Hodgkin lymphoma [70].
In conclusion, NCAPG, BUB1B, TOP2A, CCNA2, NUSAP1, UBE2C, AURKB, RRM2, CDK1, and KIF11 are potential markers for diagnosing and treating SCLC. Additionally, we identified two genes, TOP2A and RRM2, and their potential related drugs, which may offer novel ideas for treating SCLC. However, this study has limitations: our experiments were subject to significant bias, and we did not validate the candidate drugs through relevant experiments. Therefore, these drugs require validation using relevant experimental models.

Figure 1: Workflow chart of integrative bioinformatics in this study.

Figure 2: Volcano plot of differentially expressed genes between SCLC tissues and normal lung tissues in datasets GSE6044, GSE30219, and GSE149507. Red denotes genes with high expression in tumor tissues, and blue denotes genes with low expression in tumor tissues. (a) GSE6044; (b) GSE30219; and (c) GSE149507.

Figure 4: Construction of the PPI network and analysis of significant gene modules. (a) The PPI network of differentially expressed genes and (b) the top ten genes in the PPI network. Orange nodes represent upregulated genes, and blue nodes represent downregulated genes.
qPCR showed similar expression trends for the ten hub genes in SCLC cells and normal human lung cells, verifying the accuracy of our findings.

Table 1: Primers used for qRT-PCR

Table 4: KEGG pathway enrichment analysis. Term: category of the pathway in the KEGG pathway enrichment analysis.

Table 5: Significant drugs targeting hub genes

Table 6: Publications related to the effective drugs targeting hub genes